Codec integrated voice conversion for embedded speech synthesis
نویسندگان
چکیده
Voice conversion technologies transform individual characteristics of speech patterns while preserving the original content, and can be widely used in speech processing. Considering limited system resources, in particular, of embedded concatenative speech synthesis, voice conversion may reduce the memory consumption of the acoustic database. Voice conversion enables the intra-gender or cross-gender generation of new voices by using an existing high-quality voice. Usually, voice conversion is based on modification of spectral properties in accord with pitch manipulation. Warping functions in the frequency domain aiming at a reverse vocal tract length normalization (VTLN) is a simplified approach. Consequently, voice conversion itself generates a critical calculation complexity which contradicts the practical constraints of typical embedded and mobile applications. The authors propose a novel approach for voice conversion by re-using features of a common speech codec. Such a codec is already available in typical mobile applications and the resulting voice quality is widely accepted. The paper investigates the manipulation of the immittance spectral frequencies (ISF) provided by the Adaptive Multi Rate Wideband codec (AMR-WB). This algorithm has been integrated into the embedded speech synthesizer microDRESS.
منابع مشابه
Evaluation of VTLN-based voice conversion for embedded speech synthesis
Recently, we demonstrated that vocal tract length normalization (VTLN) can be applied to voice conversion tasks. In particular, when the conversion algorithm is performed in time domain, this technique is very resource-efficient and, consequently, suitable for embedded applications. In this paper, we use VTLNbased voice conversion as a novel feature of a small footprint speech synthesizer runni...
متن کاملEmbedded system for speech recognition and image processing
In recent years, the products of voice terminal and image retrieval show the intelligentized trend, but the mature commodities are rare in the market. This paper presents an embedded design method of intelligent voice terminal based on pattern recognition. The design adopts Samsung S3C2410 ARM as target board, Philips Uda1341TS as audio codec, embedded Linux OS as software platform, and speech ...
متن کاملDuration-embedded bi-HMM for expressive voice conversion
This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward’s minimum variance clustering method is used to cluster all the conversion units (sub-syllables) in order to reduce the number of conversion models as well as the size of the required training database. The duration-embedded Bi-HMM trained with the EM algorithm is built for each sub-syllable cl...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کاملطراحی یک روش آموزش ناموازی جدید برای تبدیل گفتار با عملکردی بهتر از آموزش موازی
Introduction: The art of voice mimicking by computers, has with the computer have been one of the most challenging topics of speech processing in recent years. The system of voice conversion has two sides. In one side, the speaker is the source that his or her voice has been changed for mimicking the target speaker’s voice (which is on the other side). Two methods of p...
متن کامل